BACKGROUND OF THE INVENTION
The present invention relates generally to a combination of chemoinformatics and
bioinformatics and data on chemical-molecular target interactions to create
multidimensional databases. More particularly, this invention relates to databases comprising
chemical compound, molecular target, and biological or clinical information in which
patterns or relationships of interactions between chemical compounds and molecular
targets are determined and compared with other information in the database in order to
draw conclusions that are useful for drug discovery and development and for related areas.
The worldwide pharmaceutical industry spends more than $30 billion a year on
research and development, of which nearly one-third is spent on the discovery and early
development phase, which is the period leading up to the selection of a drug candidate for
preclinical and clinical development. Some critical steps in drug discovery include (1)
sequencing DNA comprising segments of the human genome; (2) identification of genes
within the genome that are associated with specific diseases or biological functions; (3)
production of a protein such as a receptor or enzyme that corresponds to, or is encoded by,
the functional gene and which then becomes a biological or molecular target for drug
discovery; (4) screening a library of chemical compounds for activity against the
molecular target (high-throughput screening); (5) screening the most potent active
compounds against other biological targets (particularly other receptors or enzymes) to
assess the compounds' selectivity or specificity for the intended biological/molecular target
and potential to cause undesirable side effects through activity at other targets; (6)
evaluating the most potent and selective compounds for their activity in a range of other
assays designed to measure such properties as toxicity, absorption, distribution,
metabolism, excretion, etc.; (7) assessing the most promising compounds based on
empirical judgments using the above information, and then sending that information to a
chemical synthesis group to produce analogs (or modified but related chemical structures)
of the initial active compounds; (8) retesting the chemical analogs through Steps (4) and (5)
and then repeating Step (7) until an optimized lead compound or series of compounds
is identified; and (9) forwarding the optimized lead compounds to further preclinical and
clinical testing.
Throughout this process of discovery and development, compounds go through
successively narrower filters, and compounds are eventually selected for the more
expensive phases of preclinical and clinical development. Unfortunately, the selection
process often leads to preclinical testing and clinical testing of compounds that will fail at
these stages and never reach commercialization. These failures lead to extremely high
average costs, estimated to exceed $300 million, to develop and launch a new drug. If,
however, the optimal drug candidate is correctly identified early in the discovery and
development process and successfully passes preclinical and clinical testing, the actual
cost to develop that drug may be reduced by as much as 75%. Clearly, a major goal of
pharmaceutical R&D should be to enhance the predictability of early drug development
tests such as those outlined above.

With the revolution of new techniques in biotechnology and the evolution of tools
to automate many laboratory processes, two dominant trends have emerged in recent years
that are having an important impact on pharmaceutical R&D. First, the number of
molecular targets (such as new receptors and enzymes) available for discovery screening
programs continues to increase dramatically due to progress in sequencing the human
genome. About 400 molecular targets have been explored for drug discovery; estimates of
the number of potential molecular targets that may be elucidated from the human genome
project range from the thousands to more than 10,000. Second, the size of chemical
compound libraries available for discovery screening programs has expanded nearly tenfold
(to more than a million compounds in many drug companies) due to automation and
new technologies such as combinatorial chemistry. These two factors hold tremendous
promise for new drug discovery, but they also create significant potential problems having
adverse consequences on the cost of drug development. More targets and more
compounds will result in many more bioactive compounds being discovered, leading to
greater difficulty in selecting the optimal drug candidates to advance to preclinical testing,
as well as increased development costs due to more compounds entering preclinical and
clinical testing and potentially more failures at these stages.

These factors point to an increased need for rapid, inexpensive, in vitro ("test-tube"
or microplate-based) assays for lead compound selection, optimization, and validation.
Such rapid assays may help identify the most promising of these active compounds before
they enter the later, more expensive stages of drug development. These factors further
point to a need for more effective methods to manage and interpret the vast amount of data
on genes and gene products (molecular targets), chemical structures, and screening results.
One application of in vitro assays that is gaining increased importance in
pharmaceutical R&D is "profiling." The assignee of this patent application pioneered the
concept of profiling in the late 1980s. Drug companies are provided with an
extraordinarily broad array of in vitro assays for characterizing the pharmaceutical activity
and the potential side effects of compounds under development as new drugs. Currently
there are more than 200 different assays that may be performed on a routine basis based on
molecular targets, called receptors and enzymes, that play a key role in a wide range of
human diseases, including those associated with central nervous system disorders, immune
diseases, pain and inflammation, infectious diseases, cancer, metabolism or growth factors,
cardiovascular function, and the endocrine system. Pharmaceuticals accounting for more
than one-half of the worldwide market function by interacting with cellular receptors. In
addition, many side effects of pharmaceuticals are also mediated through their interactions
with receptors or enzymes.

Through profiling, a drug company's lead compounds, generally those entering
preclinical development, are tested in a battery of receptor and enzyme assays.
Information from the profiling process about interactions between the drug company's
compound and certain receptors is important for the process of lead compound
optimization and selection and can suggest possible side effects or secondary therapeutic
activities of the compound. This knowledge can potentially save the drug company
millions of dollars in wasted time and expense during preclinical and/or clinical
development of the compound.

While profiling services have been practiced for many years, the data generated
from these tests are generally used empirically by drug companies. Most drugs, even
highly selective drugs, interact with numerous receptors or other molecular targets.
Interpreting data produced by profiling, therefore, depends on the experience and
knowledge of the scientist from the drug company who reviews the data on both the
chemical structure of the compounds and the binding interactions of the compounds with
specific receptors. Unfortunately, even the most experienced pharmacologist has an
incomplete knowledge of the interaction of different drug compounds with the broad range
of receptors relevant to drug development.

The need for more effective methods to manage, collate, interpret, and utilize the
vast amount of data on genes and gene products (molecular targets), chemical structures,
and screening results has led to the creation of new opportunities in bioinformatics and
chemoinformatics, or managing biological and chemical data. The stages of generating
large pools of information for drug discovery can be broken down into (1) DNA sequences
(code of genetic material or genes that are blueprints for the cell to make gene products or
proteins); (2) functional genomics (process of conversion of DNA sequences to expression
of corresponding gene products or proteins via mRNA production, especially in response
to drugs or changes in biological function); (3) proteomics (identification of the amino
acid sequence and/or three-dimensional structure of gene products or proteins, such as
receptors, for which the genes code); (4) small molecule pharmacology/toxicology
(molecular binding or interactions between gene products, like receptors, and small
organic chemicals that are potential drugs); and (5) chemical structure (of small
molecule, drug-like compounds).

Databases for DNA sequences (Group 1) are well established and include
GenBank, The Genome Center, and others. Similarly, databases of chemical structures
(Group 5) are well known and provided by vendors such as MDL (Isis) and Oxford
Molecular. Databases for proteomics (Group 3), such as SWISS-PROT, ProLink, and
PDB, are also being established. Each of these databases can be considered as
one-component, in that they contain structural information and can be used to determine
patterns in that one dimension or single component of structural or sequence information.
Databases for Groups 2 and 4 are not well established, but should be valuable additions to
the information pool for drug discovery and development. These latter two forms of
datasets would be two-component or two-dimensional in that they would contain data
relating to the interaction between two structures, such as genes to proteins (Group 2) and
proteins to chemicals (Group 4). Such relationship databases add a significant level of
complexity compared with the one-component databases.

Partial databases or datasets for Group 4 relationships have been or are being
established. For example, profiles of the binding of single compounds against a broad set
of receptor targets, produced by the assignee for its clients, are a partial dataset for
Group 4-type databases. Similarly, data generated through high throughput screening
projects in which thousands to hundreds of thousands of chemicals, such as might be
contained in a chemical structure database (Group 5), are screened for activity against a
specific receptor target (a single point in a Group 3 database), would represent a partial
Group 4 database.
Although such partial Group 4 datasets will be helpful aids for drug discovery and
development, they suffer from two major drawbacks. First, they are directed toward
specific two-component analyses, such as the binding selectivity of a single compound or
limited set of compounds across a range of receptors (profile) or of many compounds at
one receptor target (high throughput screening). In both cases, the breadth of the dataset is
insufficient to allow statistical correlations to be drawn among a multiplicity of receptor
targets and a multiplicity of chemical structures. Second, and importantly, these partial
datasets are being generated on chemical compounds selected for their structural novelty
and therefore proprietary potential as new drugs. Since these are novel compounds, there
does not exist any biological information about the activity of these compounds in animals
or humans. Such approaches therefore suffer the same limitations as the pharmacologist
trying to empirically interpret the data of a profile, as described above.
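The difference between a profile and a high throughput screen can be pictured as two slices of one compound-by-target matrix. The minimal Python sketch below assumes a hypothetical in-memory layout (compound name mapped to percent inhibition per target, with None marking an untested pair) and invented compound and target names. It shows that a profile is one row, a screen is one column, and that only compounds measured at both of two targets can contribute to a cross-target correlation.

```python
from typing import Dict, List, Optional

# Hypothetical layout: compound -> {target -> % inhibition}; None marks an untested pair.
Matrix = Dict[str, Dict[str, Optional[float]]]


def profile(matrix: Matrix, compound: str) -> Dict[str, Optional[float]]:
    """A profile: one compound tested across a battery of targets (one row)."""
    return dict(matrix[compound])


def screen(matrix: Matrix, target: str) -> Dict[str, Optional[float]]:
    """High throughput screening: many compounds tested at one target (one column)."""
    return {name: row.get(target) for name, row in matrix.items()}


def coverage(matrix: Matrix, targets: List[str]) -> float:
    """Fraction of compound-target cells that actually hold a measurement."""
    cells = [row.get(t) for row in matrix.values() for t in targets]
    return sum(v is not None for v in cells) / len(cells)


def paired(matrix: Matrix, t1: str, t2: str) -> List[str]:
    """Compounds measured at both targets; a cross-target correlation needs many of these."""
    return [name for name, row in matrix.items()
            if row.get(t1) is not None and row.get(t2) is not None]


# Invented example: one profiled compound plus two compounds screened at T1 only.
targets = ["T1", "T2", "T3"]
partial: Matrix = {
    "cpd_A": {"T1": 85.0, "T2": 12.0, "T3": 60.0},  # profiled across all targets
    "cpd_B": {"T1": 70.0, "T2": None, "T3": None},  # screened at T1 only
    "cpd_C": {"T1": 5.0, "T2": None, "T3": None},   # screened at T1 only
}
```

In this toy case `coverage(partial, targets)` is 5/9 and `paired(partial, "T1", "T2")` returns only `["cpd_A"]`. One compound is far too few for any statistical relationship between T1 and T2, which is the drawback described above. A complete matrix would have a coverage of 1.0.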
SUMMARY OF THE INVENTION 
Accordingly, it is an object of the present invention to meet the foregoing needs by
providing systems and methods for analyzing data relevant to drug discovery and
development. A full-rank screening database including positive and negative data
resulting from a large number of chemical compounds being tested against a large number
of molecular targets is provided. The number of combinations of chemical compounds
and molecular targets must be large enough that a person of ordinary skill in the art of
statistical or other data mining methods can use the screening database together with the
corresponding chemical compound database and molecular target database to produce a
reliable prediction of which chemical compounds are suitable for clinical testing and have
an enhanced probability of being safe and effective drugs.

Specifically, systems and methods for meeting the foregoing needs are disclosed.
The system includes a computer system comprising a first database containing records
corresponding to a plurality of chemical compounds and records corresponding to
biological information related to effects of the plurality of chemical compounds on
biological systems of humans or animals, and a second database containing records
corresponding to a plurality of molecular targets. The computer system further comprises
a third database containing records corresponding to tests of binding, reactivity, or other
interactions between compounds in the first database and molecular targets in the second
database, the tests including information on the effect that a compound from the plurality
of compounds in the first database has on the interaction between a selected compound
(e.g., a reference agent or standard) known to interact with a specific molecular target from
among the plurality of molecular targets, said tests being performed for a plurality of the
molecular targets in the second database. Means for setting an interaction test threshold
corresponding to said effect and means for selecting the compound, sets of compounds,
and/or information associated with such compound(s) when the results of the testing of the
effect meet the interaction test threshold are also included in the computer system. A user
interface is provided to allow a user to view and manipulate or analyze information from
the first database, the second database, and the third database as it relates to one or more
compound records in the first database and/or as it relates to one or more molecular target
records in the second database, especially with respect to compounds, molecular targets, or
other database records associated with results that meet the interaction test threshold(s).
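As an illustration only, the three databases and the threshold-based selection could be laid out as relational tables. The sketch below uses Python's built-in sqlite3 module. Everything specific in it is an assumption and not the schema of the invention: the table and column names, the use of percent inhibition of reference-agent binding as the measured effect, the percent-inhibition helper, and the sample rows (including the "sedation" effect).

```python
import sqlite3

# Hypothetical schema mirroring the three databases described above.
SCHEMA = """
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE biological_info (            -- biological records of the first database
    compound_id INTEGER REFERENCES compound(id),
    effect TEXT NOT NULL                  -- e.g. an observed side effect
);
CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE interaction_test (           -- the third database
    compound_id INTEGER REFERENCES compound(id),
    target_id INTEGER REFERENCES target(id),
    pct_inhibition REAL NOT NULL          -- effect on reference-agent binding, in %
);
"""


def pct_inhibition(binding_with_compound: float, control_binding: float) -> float:
    """Percent inhibition of the reference agent's specific binding by a test compound."""
    return 100.0 * (1.0 - binding_with_compound / control_binding)


def hits(conn: sqlite3.Connection, threshold: float):
    """Compound/target pairs whose effect meets the interaction test threshold,
    joined with any biological information recorded for the compound."""
    return conn.execute(
        """
        SELECT c.name, t.name, i.pct_inhibition, b.effect
        FROM interaction_test AS i
        JOIN compound AS c ON c.id = i.compound_id
        JOIN target AS t ON t.id = i.target_id
        LEFT JOIN biological_info AS b ON b.compound_id = c.id
        WHERE i.pct_inhibition >= ?
        ORDER BY c.name, t.name
        """,
        (threshold,),
    ).fetchall()


# Invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO compound VALUES (?, ?)", [(1, "cpd_A"), (2, "cpd_B")])
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "T1"), (2, "T2")])
conn.execute("INSERT INTO biological_info VALUES (?, ?)", (1, "sedation"))
conn.executemany(
    "INSERT INTO interaction_test VALUES (?, ?, ?)",
    [
        (1, 1, pct_inhibition(200.0, 1000.0)),  # about 80 %
        (1, 2, pct_inhibition(900.0, 1000.0)),  # about 10 %
        (2, 1, pct_inhibition(450.0, 1000.0)),  # about 55 %
    ],
)
```

With a threshold of 50 %, `hits(conn, 50.0)` returns cpd_A at T1 together with its recorded effect, and cpd_B at T1 with no recorded effect. The weak cpd_A/T2 interaction falls below the threshold. Raising or lowering the threshold corresponds to the "means for setting an interaction test threshold".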

Furthermore, the invention relates to using methods of statistical analysis and other
data mining methods as applied to these multidimensional databases to determine
correlations or patterns that are relevant to drug discovery and development.
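One of the simplest correlation measures that could be applied to such a database is the phi coefficient. It compares a binary "binds the target" flag (for instance, a result that meets the interaction test threshold) with a binary biological effect recorded for the same compounds. The sketch below is a hypothetical example with invented data, not a method specified in this description. It ranks targets by how closely their hit pattern follows the effect across compounds.

```python
import math
from typing import Dict, List, Sequence, Tuple


def phi_coefficient(hit: Sequence[bool], effect: Sequence[bool]) -> float:
    """Phi correlation between 'binds the target' and 'shows the effect' over compounds."""
    n11 = sum(h and e for h, e in zip(hit, effect))              # hit, effect
    n10 = sum(h and not e for h, e in zip(hit, effect))          # hit, no effect
    n01 = sum((not h) and e for h, e in zip(hit, effect))        # no hit, effect
    n00 = sum((not h) and (not e) for h, e in zip(hit, effect))  # neither
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def rank_targets(hits_by_target: Dict[str, Sequence[bool]],
                 effect: Sequence[bool]) -> List[Tuple[str, float]]:
    """Targets ordered from strongest to weakest association with the effect."""
    scores = [(t, phi_coefficient(h, effect)) for t, h in hits_by_target.items()]
    return sorted(scores, key=lambda ts: ts[1], reverse=True)


# Invented data for six compounds.
effect = [True, True, False, False, False, False]
hits_by_target = {
    "T1": [True, True, True, False, False, False],
    "T2": [False, True, False, True, False, True],
}
ranking = rank_targets(hits_by_target, effect)
```

With this sample data, T1 scores about 0.71 (exactly 1/sqrt(2)) and T2 scores 0.0. T1 would therefore be flagged as the target whose binding pattern best tracks the effect. A real analysis would also need to account for multiple comparisons across many targets.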

Both the foregoing general description and the following detailed description
provide examples and explanations only. They do not restrict the claimed invention.

DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this
specification, illustrate embodiments of the invention and, together with the description,
explain the advantages and principles of the invention.

Fig. 1A illustrates a chemical compound table in the receptor selectivity mapping
database according to one embodiment of the present invention;
Fig. 1D illustrates a snapshot of a chemical compound record containing spatial
coordinates of a compound in the receptor selectivity mapping database according to one
embodiment of the present invention;

Fig. 2 illustrates several logical tables that may be used to access the molecular
target information in the receptor selectivity mapping database according to one
embodiment of the present invention;

Fig. 3 illustrates a biological information table in the receptor selectivity mapping
database according to one embodiment of the present invention;

Fig. 4 illustrates the use of a receptor selectivity mapping database as part of a
screening process according to one embodiment of the present invention;

Fig. 5A illustrates the use of a receptor selectivity mapping database as part of a
screening process to discover and select new compounds as potential new drug candidates
for further development;

Fig. 5B illustrates the use of a receptor selectivity database as part of a screening
process to identify new targets as potential validated targets to use to discover new drug
candidates for specific disease indications;

Fig. 6A illustrates the use of a database for predicting the drug potential of a new
compound; and

Fig. 6B illustrates the use of a database to validate the disease relevance and/or the
biological function of a new molecular target.

DETAILED DESCRIPTION 
Reference will now be made to preferred embodiments of this invention, examples
of which are shown in the accompanying drawings and will be obvious from the
description of the invention. In the drawings, the same reference numbers represent the
same or similar elements in the different drawings whenever possible.

Systems and methods consistent with the present invention allow the analysis of
data relevant to drug discovery and development, for example, for predicting the potential
of a new compound's suitability for progression to preclinical and clinical tests with an
enhanced probability of becoming a safe or effective new drug. For purposes of the
following description, the systems and methods consistent with the present invention are
described with respect to a relational database containing multiple main tables and with the
use of the binding between chemical compounds and molecular targets as a measurement
of the interactions between the two. The description should also be understood to apply in
general to any database structure having multiple main components and to the
measurement of any interactions between chemical compounds and molecular targets.

The present invention relates to the novel design, construction, and application of a
database relating information-rich chemicals, molecular targets (especially proteins or
other macromolecules), and biological activity of the chemicals. Furthermore, the present
invention relates to the primary use of known drugs and drug candidates that have failed in
clinical or preclinical trials as a source of the chemical library for the database, together
with preclinical or clinical data generated for such chemicals describing their side effects,
mechanism of action, and other medically relevant data. The present invention further
relates to determining the binding or other interactions between the chemicals and the
molecular targets in the database, then using methods of relationship analysis and data
mining to correlate patterns of these interactions with specific biological activities that are
relevant to drug discovery and development, or with specific chemical structures,
substructures, or other features of compounds exhibiting such interactions, or with
biochemical, structural, or other features of molecular targets exhibiting such interactions.
Examples of such data mining techniques can be found in the following references, which
are incorporated by reference in their entirety:

a) Chen et al., Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional Descriptors, Journal of Chemical Information and Computer Sciences, October 1998;

b) Hawkins et al., Analysis of a Large Structure-Activity Data Set Using Recursive Partitioning, Quant. Struct.-Act. Relat., 16:296-302 (1997);

d) Good et al., in Reviews in Computational Chemistry, Lipkowitz, K. B., Boyd, D. B. (eds.), VCH, New York, Vol. 7, pp. 67-117 (1996);

e) Marshall et al., in Computer-Assisted Drug Design, ACS Symposium Series 112, American Chemical Society: Washington, DC, 1979, pp. 205-226;
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f) Moloc tt a!. , A three-dimensional structure artrviry relationships and biological 
receptor mapping, in Mathematics and Computational Concepts in Chemistry, Ellis 
Horwocd; Chichester, 1985: pp 225-251: 

g) Mayer et a/., A unique geometry of the active site of angiotnuin-convening- 
5 enzyme consistent with structure activity studies. J. Comput Aided Mol. Des., 1 :3-16. 

(1987); 

h) Sheridan et a!.. The ensemble approach to distance geometry: application to the 
nicotinic pharmacophonc, J. Med Chera. 29.K99-906 ( 1 986); 

i) Martin et aJ. , A fast new approach to pharmacophonc mapping and its 

1 0 application to dopaminergic and benzodiazepine agonists, J. Comput Aided Mol. Des.. 
7;83-l02(!993); 

j) Catalyst/Hypo Tutorial, version 2.0. BioCAJD Corp. Mountain View, CA. 1993 

k) Sprague, P. W.. Automaicd chemical hypothesis generation and database 
searching with Catalyst Pcrspcct Drug Discov. Des., 3:1-20 (l°95); 
1 5 I) Bnraum et a!. Identification of common functional configurations among 

molecules, J. Chem. Inf. D>mput. Sci., 1996, 36:563-71 (1996) 

m) UipHop Tutorial, version 2.3; Molecular Simulation Inc.; Sunnyvale. CA. 1995; 

n) Davie*, K. and Upinn. R.. 3D pharmacophore searching. Net. Sci.. 
(bttp.v?www.rietsTi.on3/Scier*ccO 
20 o) Oo lender, V. and Vcsterman. B., APEX 3D expert system for drug design, Net. 

Sci . (http:/yww.a\vod.com/riets<^^ ); 

p) Van Drie, J., Strategies for the determination of pharmacophoric 3D database queries, J. Comput. Aided Mol. Des., 11:39-52 (1997);

q) Van Drie, J. and Nugent, R., Addressing the challenges posed by combinatorial chemistry: 3D databases, pharmacophore recognition and beyond, SAR QSAR Environ. Res., 9:1-21 (1998);

r) Finn et al., Pharmacophore discovery using the inductive logic programming system Progol, in Machine Learning, Special Issue on Applications and Knowledge Discovery, Kluwer Academic Publishers: Boston, 1998, pp 1-33; and
s) Jain et al., Compass: a shape-based machine learning tool for drug design, J. Comput. Aided Mol. Des., 8:635-652 (1994).
The background section suggests that, contrary to standard operating procedures in the pharmaceutical industry, a Group 4 database should be established having more components than a two-component database, and that it should cover a substantial breadth of both receptor or enzyme targets and chemical compounds. By way of example, a three-component database would be created by first selecting a broad set of chemical compounds that are rich in information of direct relevance to drug discovery and development. The most relevant information is often obtained by actual experience of testing such chemical compounds in humans through clinical trials and/or post-marketing surveillance or in animals through preclinical testing. Other relevant biological information may come from natural products that demonstrate one or more observed bioactivities, as well as chemical reference standards that have been used in the industry to characterize the biology of receptors. Accordingly, one embodiment of information-rich chemical compounds selected for such a Group 4 database includes marketed pharmaceuticals, drugs that have failed in clinical or preclinical trials, bioactive natural products or natural extracts, and reference agents used for receptor binding assays.

One may construct such a database using screening data obtained from the scientific literature. While this approach could yield partial datasets, it may have limitations. First, literature references generally provide only positive information (e.g., reports of inhibition of binding of a specific compound to a specific receptor) and not negative data (e.g., a lack of inhibition of binding and therefore lack of activity). In determining useful comparisons of information, negative data can be as valuable as positive data. Furthermore, certain statistical analyses may not be applicable to datasets that lack both positive and negative data. Second, separate quantitative reports of binding data for one compound against a receptor in one article vs. reports of binding data for a second compound at the same receptor may not be comparable because of variations in the way the assays were performed. Therefore, one embodiment for creation of a Group 4 three-component database would be to screen a broad array of compounds through a broad array of receptor or enzyme targets in order to obtain consistent comparative results and ensure the collection of both positive and negative data.

The Chemical Compound Component: Selection of Chemical Libraries and Inclusion of Chemical Data

The present invention relates to databases that contain, as one component, chemical compounds about which information is known concerning biological activity relevant to pharmaceutical research and development. The biological activity information may be included in the chemical compound database or table.

For example, these information-rich chemicals include:

(a) Compounds that are pharmacological reference agents or reference standards for measuring the interaction or molecular binding between unknown chemical compounds and a specific molecular target, such as a receptor or enzyme. Examples of such reference compounds include those compounds that are used for characterizing binding interactions between test compounds and molecular targets including receptors or enzymes. Other reference agents could include chemicals selected from the catalog of Research Biochemicals Inc. (RBI), a unit of Sigma-Aldrich Corp., and from other sources that are well known in the industry. These pharmacological reference compounds often have been tested previously and/or marketed as pharmaceuticals or are natural products with characterized biological activity and therefore may overlap with compounds in the following three categories;

(b) Compounds that are known pharmaceuticals that are currently or have previously been marketed for clinical use, and for which there is a substantial amount of biological information available. These compounds are well known and are listed in publications available from U.S. government agencies such as the Food and Drug Administration (FDA), as well as publications by private or non-profit organizations. One such publication by a non-profit organization is the United States Pharmacopeial Convention Inc.'s USP DI Series, including Volume I, Drug Information for the Health Care Professional, which is updated monthly by USP DI Update. As new drugs are approved for marketing, they would be included in this category. Marketed pharmaceuticals or drugs approved by the FDA or equivalent foreign regulatory bodies are a matter of public record, so that one of ordinary skill in the art can easily identify chemical compounds that would be included in this category;

(c) Compounds that have been approved for testing in humans, such as compounds that had been granted IND (Investigational New Drug) status, as potential drugs but that failed to achieve sufficient efficacy or safety in clinical trials to gain approval from the FDA or otherwise did not reach the status of marketed pharmaceuticals. Compounds in this category may also include those compounds that have been approved by the FDA for commercialization but that have later been withdrawn from the market. These compounds also would have a significant amount of biological information available and would be especially useful for purposes of this invention. The identity of failed drugs can be obtained from numerous sources, including public announcements by drug and biotechnology companies, publications such as the "Pink Sheets," and lists maintained by the FDA; and

(d) Compounds that are obtained from natural sources such as plants, microorganisms, animals, etc., that exhibit biological activity. These natural products may include toxins, antimicrobial agents, behavioral modifiers, defensive agents, and other categories of compounds that provide information relevant in drug discovery and development. The identity of natural products can be found in numerous publications, including but not limited to the RBI catalog and Sigma-Aldrich catalog of chemical compounds.

For each compound included in the database, chemical structure, chemical formulae, physical-chemical characteristics, chemical space coordinates or other chemical structure descriptors (e.g., SMILES codes), solubility, and other relevant data, to the extent such information is available, are entered into fields in the database. Those skilled in the art would recognize other parameters that might be included. Chemicals can be organized by chemical structure relatedness in the database or by other relationships.

Fig. 1A illustrates a chemical compound table 300 in a relational database system. The table 300 lists a number of chemical compounds and includes records (rows 1-N) of a number of compounds N. For each compound there may be a number of corresponding columns 301-307 containing information related to the compound. For example, in Fig. 1A column 301 contains the name of the compound; column 302 includes the compound type (e.g., compounds that have been approved for testing in humans, etc.); column 303 includes information related to the chemical structure, for example, a hyperlink that brings up a screen containing a drawing of the structure (see snapshot 310 in Fig. 1B); column 304 includes the chemical formula for the compound; column 305 includes information about the physical-chemical characteristics of the compound; column 306 includes chemical space coordinates of the compound; and column 307 includes solubility information of the compound.

Additional columns may be added in order to include other relevant data related to each chemical compound 301 listed in the table 300. These additional columns may include biological activity of the compound, rendering the chemical compound database a two-component database (see also database 500).

Fig. 1B illustrates a snapshot 310 that may include information corresponding to a record in the table 300. For example, the chemical formula 304 of a compound may be included in the snapshot of the record as well as the compound's structure 303.
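As a minimal sketch of how table 300 might be laid out in a relational system, the example below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical (the comments map them to columns 301-307), and the two rows are illustrative only. It also shows an added biological-activity column and a record lookup of the kind a snapshot such as 310 might display.

```python
import sqlite3

# In-memory stand-in for chemical compound table 300 (Fig. 1A).
# Column names are hypothetical; comments map them to columns 301-307.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE compounds (
        name           TEXT PRIMARY KEY,  -- column 301
        compound_type  TEXT,              -- column 302
        structure_link TEXT,              -- column 303: link to a structure drawing
        formula        TEXT,              -- column 304
        phys_chem      TEXT,              -- column 305
        space_coords   TEXT,              -- column 306
        solubility     TEXT               -- column 307
    )
""")
conn.executemany(
    "INSERT INTO compounds (name, compound_type, structure_link, formula) "
    "VALUES (?, ?, ?, ?)",
    [
        ("acetaminophen", "marketed pharmaceutical", "structures/acetaminophen", "C8H9NO2"),
        ("aspirin", "marketed pharmaceutical", "structures/aspirin", "C9H8O4"),
    ],
)

# Adding a biological-activity column makes this a two-component table.
conn.execute("ALTER TABLE compounds ADD COLUMN biological_activity TEXT")

# Retrieve the fields a record snapshot (such as 310 in Fig. 1B) might display.
row = conn.execute(
    "SELECT name, formula, structure_link FROM compounds WHERE name = ?",
    ("aspirin",),
).fetchone()
print(row)  # ('aspirin', 'C9H8O4', 'structures/aspirin')
```

Whether structure data is stored inline or as a link (column 303) is a design choice; a link keeps the table compact while letting the interface render the drawing on demand.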



The Molecular Target Component: Selection of Receptors, Enzymes, and Other Molecular Targets and Inclusion of Molecular Target Data

Molecular targets such as receptors, enzymes, other proteins, nucleic acids, carbohydrates, and other macromolecules relevant to drug discovery and development are representative of the second component of the databases comprising this invention. In one embodiment of this invention, receptors and enzymes are the principal molecular targets. Receptors mediate much of the molecular communication among cells and organs in the body. Enzymes often amplify such communications through, for example, second messenger systems and cell signaling pathways.

Receptors include classical families of receptors such as dopamine receptors, serotonin receptors, opiate receptors, muscarinic receptors, adrenergic receptors, adenosine receptors, etc. These receptor groups include subtypes of the receptor type (such as dopamine-1, dopamine-2, dopamine-3, dopamine-4, and dopamine-5 receptors). Certain subtypes have further variations (such as dopamine 4.2, dopamine 4.4, and dopamine 4.7) or can have different forms (such as dopamine 2 short and dopamine 2 long). Splice variants of receptors can also occur, as can mutations in the genes encoding specific receptors, which might lead to a subset of a population that has a receptor with slightly different binding affinity for drugs or other compounds compared with the normal receptor type. Receptors can be grouped by family, superfamily, or subfamily. Some groupings include G-protein coupled receptors, 7-transmembrane receptors, nuclear receptors, etc. Receptors can be grouped by the degree of homology of the DNA sequence of their corresponding genes. Receptors can also be grouped by their amino acid sequence and related three-dimensional conformations. Receptors can be classified by their location of expression in tissues or across different cell types.
Enzymes can include proteases, carbohydrases, kinases, phosphatases, DNA-modifying enzymes, transferases, P450s, and others known to those skilled in the art.

Other receptors, receptor sources, and corresponding assays are constantly being developed by the assignee to be added to the content of the database. Additional receptors and receptor assays are well known to those skilled in the art. Lists and descriptions of certain receptors relevant to drug discovery and development can be found in numerous publications known to those skilled in the art. These publications include the RBI Handbook of Receptor Classification and the IUPHAR receptor classification book. Furthermore, as new receptors and receptor subtypes are discovered, they can be added to the content of the database.
Enzymes and enzyme assays are well known to those skilled in the art. Lists and descriptions of certain enzymes relevant to drug discovery and development can be found in numerous publications known to those skilled in the art.

Fig. 2 illustrates tables 400, 410, and 420 forming part of a relational database system which may be used to access molecular target information. Table 400 lists the targets and includes records (rows 1-M) of a number of targets M. Column 401 lists the names of the targets, while column 402 specifies the target type corresponding to each target name.

Table structures may vary according to the target type specified in column 402. Table 410 includes information about those targets listed in table 400 which are classified as receptors. Records from table 410 may be accessed by querying the database for a particular receptor name. The receptor names found in table 410 may be accessed, in turn, by querying table 400 for those target names for which column 402 reads "Receptor."

In table 410, column 411 contains the name of the receptor, which is also the name of the target in column 401 in table 400; column 412 includes receptor family information; column 413 includes receptor superfamily information; column 414 includes receptor subfamily information; column 415 includes information about the degree of homology of the DNA sequence of corresponding genes; and column 416 includes information on amino acid sequence. The amino acid sequence is one of a number of molecular descriptors that may be included in the database. Other molecular descriptors 417, for example, could include hydropathy plots corresponding to the amino acid sequence. Because the molecular target database represented by tables 400, 410, and 420 includes target information, and associated biological information related to the targets is included in the database (see table 600), this database may be considered a two-component database. The columns shown are illustrative of the types of information that may be included in the database and should not be construed as limiting the invention.
Table 420 includes information about those targets in table 400 that are classified as enzymes. Records from table 420 may be accessed by querying the database for a particular enzyme name. The enzyme names found in table 420 may be accessed, in turn, by querying table 400 for those target names for which the target type column 402 reads "Enzyme."

In table 420, column 421 contains the name of the enzyme, which is also the name of the target in column 401 of table 400, and column 422 includes enzyme type information. Column 423 is labeled as "Other relevant information" and is included in the table for purposes of illustrating that additional columns may be added to table 420 depending on other enzyme information that a user of the database might want to access, including amino acid sequence and molecular descriptors.

Although only tables 410 and 420 are shown to describe the access of molecular target information by using the target type, additional tables may be added to the relational database system corresponding to the number of molecular target types available in the database.
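The two-step access pattern described for Fig. 2 (look up target names of a given type in table 400, then fetch the matching rows from the type-specific table) can be sketched with `sqlite3` as follows. Table names, column names, and the `TYPE_TABLES` mapping are hypothetical; the mapping is one way to reflect the statement that further tables can be added as further target types are introduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE targets (                       -- table 400
        name        TEXT PRIMARY KEY,            -- column 401
        target_type TEXT NOT NULL                -- column 402
    );
    CREATE TABLE receptors (                     -- table 410
        name         TEXT PRIMARY KEY REFERENCES targets(name),  -- column 411
        family       TEXT,                       -- column 412
        superfamily  TEXT,                       -- column 413
        subfamily    TEXT,                       -- column 414
        dna_homology TEXT,                       -- column 415
        aa_sequence  TEXT                        -- column 416
    );
    CREATE TABLE enzymes (                       -- table 420
        name        TEXT PRIMARY KEY REFERENCES targets(name),   -- column 421
        enzyme_type TEXT,                        -- column 422
        other_info  TEXT                         -- column 423
    );
""")
conn.executemany("INSERT INTO targets VALUES (?, ?)",
                 [("dopamine D2", "Receptor"), ("acetylcholinesterase", "Enzyme")])
conn.execute("INSERT INTO receptors (name, family, superfamily) VALUES (?, ?, ?)",
             ("dopamine D2", "dopamine receptors", "G-protein coupled receptors"))
conn.execute("INSERT INTO enzymes (name, enzyme_type) VALUES (?, ?)",
             ("acetylcholinesterase", "hydrolase"))

# Hypothetical mapping from the column-402 value to its type-specific table.
# Supporting a new target type means adding a table and an entry here.
TYPE_TABLES = {"Receptor": "receptors", "Enzyme": "enzymes"}


def records_for_type(conn, target_type):
    """Select names of this type from table 400, then their rows in the detail table."""
    table = TYPE_TABLES[target_type]  # fixed whitelist, never user-supplied SQL
    return conn.execute(
        f"SELECT d.* FROM targets AS t JOIN {table} AS d ON d.name = t.name "
        "WHERE t.target_type = ?",
        (target_type,),
    ).fetchall()


print(records_for_type(conn, "Enzyme"))  # [('acetylcholinesterase', 'hydrolase', None)]
```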


The Biological Information Component: Selection of Biological/Clinical Information Parameters

Biological information forming part of the database includes material that would relate to side effects, mechanism of drug action, metabolism of a drug, toxicity, absorption, distribution, and excretion, for example. This information is available on FDA-approved labels of marketed drugs, or from literature sources and publications for drugs that have failed in clinical trials. Examples of some specific parameters are toxicity, LD50, LD50/ED50, teratogenicity, mechanism of toxicity, target organ for toxicity, in vitro toxicity battery, induction of apoptosis, bioavailability, absorption, blood-brain barrier, oral absorption, mucosal absorption, % absorbed, distribution, blood protein bound, half-life, onset of action, duration of action, peak concentration in blood, metabolism, major pathway, minor pathway, active metabolites, excretion, primary excretion mode, secondary excretion modes, in vivo effects, therapeutic indication, animal behavioral effects, side effects, primary known target, other organ/system targets, and known receptor interactions.

Fig. 3 shows table 500, which includes some of the biological information parameters mentioned above. Table 500 comprises N rows (1 through N) which correspond to all the possible chemical compounds in the first database. Column 501 includes the compound name; column 502 includes the therapeutic indication (for marketed or failed drugs); column 503 includes toxicity information; column 504 includes side effects information; and column 505 includes information on the mechanism of drug action. Table 500 would be associated with table 300, for example, to form a two-component chemical compound and biological activity table.

Fig. 3 also shows table 600, which includes biological information parameters associated with the molecular targets in the database. Table 600 includes P rows (1 through P) which correspond to all the possible targets in the second database. Column 601 includes the target name; column 602 includes the therapeutic indication (for marketed or failed drugs); column 603 includes toxicity information; and column 604 includes side effects information. Similarly, table 600 would be associated with table 400, for example, to form a two-component molecular target and biological activity table. Tables 500 and 600 together may be a full-rank database (e.g., including all possible combinations between compounds and molecular targets in a relational database system) including molecular target information, chemical compound information, and biological activity information associated with each of the molecular targets and with each of the chemical compounds, and may be considered a multidimensional database. Additional columns may be included in tables 500 and 600 without departing from the invention.
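The "full-rank" arrangement above pairs every compound with every target, so N compounds and P targets yield N x P combinations. The sketch below builds such a view with `itertools.product`; the dictionaries standing in for tables 500 and 600, and all their keys and values, are hypothetical placeholders.

```python
from itertools import product

# Hypothetical stand-ins for table 500 (per-compound biological information)
# and table 600 (per-target biological information).
table_500 = {
    "compound_1": {"indication": "indication_a", "toxicity": "tox_a"},
    "compound_2": {"indication": "indication_b", "toxicity": "tox_b"},
}
table_600 = {
    "target_1": {"indication": "indication_c", "side_effects": "se_c"},
    "target_2": {"indication": "indication_d", "side_effects": "se_d"},
    "target_3": {"indication": "indication_e", "side_effects": "se_e"},
}


def full_rank_view(compound_info, target_info):
    """One row for every (compound, target) combination: N x P rows in total."""
    return [
        {"compound": c, "target": t,
         "compound_info": compound_info[c], "target_info": target_info[t]}
        for c, t in product(compound_info, target_info)
    ]


rows = full_rank_view(table_500, table_600)
print(len(rows))  # 2 compounds x 3 targets = 6 combinations
```

In a relational system the same result is a cross join of the two tables; materializing it is optional, since each row can be derived on demand.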
Determining Binding Interaction Data
A key feature of this invention is the establishment of several components of information which, by way of illustration, comprise chemicals, molecular targets, and biological information, and measuring the binding, reactivity, or other interactions between the chemicals and molecular targets. This binding or reactivity information can then be related back to the known biological information in order to distinguish patterns and relationships that can be used for drug discovery and development. An important aspect of this invention is to generate broad and consistent binding or reactivity data between the chemicals and molecular targets in order to provide as complete a dataset as possible, so as to be able to identify relevant patterns or relationships and to provide both positive and negative binding or reactivity information for the datasets. In one embodiment, the binding data is established as a numerical descriptor that either satisfies or does not satisfy a threshold set, for example, for a specific molecular target or set of molecular targets. The numerical descriptor may relate to the activity or lack of activity for each compound at each receptor or other molecular target, measured at a concentration deemed near the appropriate threshold for relevance to the biological system or biological information set. For example, chemicals can be tested at 10⁻⁵ M (10 micromolar) for their ability to inhibit binding between a receptor and its specific reference compound, using a threshold of 30% inhibition. Other initial concentrations or percentage inhibition thresholds can be selected. Also, in one embodiment, those chemicals that demonstrate inhibition of binding above the threshold in the initial yes/no testing are further tested for the potency of the binding inhibition. These active chemicals are tested at a series of concentrations, which might, for example, include 7-14 different concentrations within a range extending downward from 10⁻⁵ M, such that an IC50 and/or Ki value can be determined for the active compound at the specific receptor. Fewer or more concentrations may be used for such determinations, and concentrations outside this range may be required. These data then yield a matrix of relative degree of activity or relative potency for each active compound at each molecular target.
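The two-stage procedure above (a yes/no call at a single 10⁻⁵ M test concentration against a 30% inhibition threshold, then a concentration series for the actives) can be sketched as below. The text does not say how IC50 or Ki values are calculated. Here IC50 is estimated by simple linear interpolation on a log10 concentration scale, a stand-in for the nonlinear curve fitting more commonly used, and Ki is derived with the Cheng-Prusoff relation for competitive binding at a single site, which needs the radioligand concentration and Kd. The tie rule at exactly 30%, all function names, and the example data are hypothetical.

```python
import math

PRIMARY_CONC_M = 1e-5      # 10 micromolar primary screening concentration
HIT_THRESHOLD_PCT = 30.0   # percent inhibition of reference-ligand binding


def is_hit(pct_inhibition: float, threshold: float = HIT_THRESHOLD_PCT) -> bool:
    """Yes/no primary-screen call; counting a value equal to the threshold as a hit is an assumption."""
    return pct_inhibition >= threshold


def estimate_ic50(concs_m, pct_inhibition):
    """Interpolate the concentration giving 50% inhibition on a log10 scale.

    Assumes concentrations are sorted ascending and inhibition rises with
    concentration. Returns None if 50% is not bracketed by the data.
    """
    points = list(zip(concs_m, pct_inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi > y_lo:
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


def cheng_prusoff_ki(ic50_m, radioligand_conc_m, radioligand_kd_m):
    """Ki = IC50 / (1 + [L]/Kd), for competitive inhibition at one binding site."""
    return ic50_m / (1.0 + radioligand_conc_m / radioligand_kd_m)


# Hypothetical 7-point concentration series for one active compound.
concs = [1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
inhibition = [0.0, 1.0, 3.0, 10.0, 40.0, 60.0, 92.0]
ic50 = estimate_ic50(concs, inhibition)   # about 3.16e-7 M (10 ** -6.5)
ki = cheng_prusoff_ki(ic50, radioligand_conc_m=1e-9, radioligand_kd_m=1e-9)
```

Only compounds for which `is_hit` returns true at the primary concentration would go on to the concentration series; the resulting IC50 or Ki values populate the potency matrix described above.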

In order to generate these screening data, chemicals are first solubilized in a suitable solvent system, such as 4% DMSO, although other concentrations of DMSO and other solvents are also acceptable. These chemical stock solutions are then diluted to the appropriate concentration and made available as repositories. For each assay measuring the interactions between the chemical and molecular target, the reagents and protocols for the assay will vary. Each such assay needs to be characterized and routinely established for consistency. Appropriate controls need to be run each time the assay is performed. Any assay format that can generate the desired type and accuracy of information can be used. Numerous assay detection systems, such as radioactive labels, fluorescence, fluorescence polarization, time-resolved fluorescence, fluorescence correlation spectroscopy, chemiluminescence, UV absorption, colorimetric detection, etc., can be used.

In one embodiment, a receptor binding assay or enzyme activity assay is used to generate data on molecular interactions. As an example, for a receptor binding assay, chemicals from a repository are tested for their ability to inhibit the binding interaction between the receptor and a reference agent selected for that receptor. The receptor may be derived from a tissue source, such as animal or human tissue, or from a cell line expressing the receptor, or from a transfected cell line containing the gene for the receptor. The receptor source is prepared for the assays, for example, by preparing a membrane fraction containing the receptor. Alternatively, the receptor may be partially purified. The reference compound, or ligand, is preferably selected for its potent and/or specific binding to the specific receptor and may carry a radioactive tracer such as iodine-125, tritium, or carbon-14, or another marker, to enable a bound ligand to be distinguished from an unbound ligand. Coincident with testing the chemicals for binding data to include in the database, positive and negative controls are run, as is a reference curve with varying concentrations of the reference radioligand, to ensure the quality of the assay run.

A plurality of methods and systems may measure the interactions between targets and compounds, as would be recognized by a person of ordinary skill. The radioligand, receptor preparation, and test compounds are incubated together for an appropriate time, in an appropriate buffer, and at an appropriate temperature, often with the objective of reaching equilibrium of the binding reactions. The amount of bound versus unbound radioligand is determined by a separation step, such as filtration, or by use of a method such as SPA (scintillation proximity assay), and measured by liquid scintillation or gamma counting. The amount of specific binding of the test compound is then determined by comparing assay results for the test chemical(s) vs. the positive and negative controls. The percent inhibition of the test chemical(s) is calculated from these data.
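The text states only that percent inhibition is calculated by comparing the test wells with the positive and negative controls. One common way to do this, assumed here, treats the no-competitor wells as total binding and the wells containing an excess of unlabeled reference agent as nonspecific binding, then expresses the loss of specific binding as a percentage. The function and argument names are hypothetical.

```python
def percent_inhibition(test_cpm: float, total_cpm: float, nonspecific_cpm: float) -> float:
    """Percent inhibition of specific radioligand binding by a test chemical.

    total_cpm:       counts with radioligand and receptor only (no-inhibition control)
    nonspecific_cpm: counts with excess unlabeled reference agent (full-inhibition control)
    test_cpm:        counts in the presence of the test chemical
    """
    specific_window = total_cpm - nonspecific_cpm
    if specific_window <= 0:
        raise ValueError("total binding must exceed nonspecific binding")
    # Fraction of specific binding that remains, converted to percent lost.
    return 100.0 * (1.0 - (test_cpm - nonspecific_cpm) / specific_window)


print(percent_inhibition(6000, 10000, 2000))  # 50.0
```

Values slightly below 0 or above 100 can occur from counting noise; whether to clip them is a reporting decision left open here.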

Fig. 4 shows table 200 as an illustration of a screening results and assay database in which, for example, chemical compounds included in database 300 (comprising 1 to N chemical compounds) are tested for their effect against molecular targets included in database 400. Numerous forms of table 200 are possible. For example, in table 210 screening results are entered as a "yes" or "no" entry with respect to whether the screening result for each of a plurality of chemical compounds tested against each of a plurality of molecular targets was above or below the selected threshold test result for each set of determinations.

As another example, in table 220 screening results are entered as a numerical descriptor identifying the potency or magnitude of the binding or other effect (e.g., the Ki for chemical:receptor interactions) for each of a plurality of chemical compounds tested against each of a plurality of molecular targets. In a preferred embodiment, all such matrix points for chemicals x targets in tables 210 and 220 are determined and entered into the database such that a full-rank dataset is derived. The screening results and assay database 200 may also include other measurements of chemical:target interactions, including raw data of screening results and measurements derived from the raw data, assay protocols and performance characteristics, and other relevant information.
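Here "full-rank" means that every chemical x target matrix point has been measured, not rank in the linear-algebra sense. The sketch below derives a table-210-style yes/no matrix and a table-220-style potency matrix from keyed screening results, and refuses to build them if any pair is untested. All names and values are hypothetical, and leaving potency empty (`None`) for inactive pairs is an assumption.

```python
# Hypothetical primary-screen percent inhibition, keyed by (compound, target).
primary = {
    ("cpd_1", "D2"): 85.0, ("cpd_1", "5-HT2A"): 12.0,
    ("cpd_2", "D2"): 5.0,  ("cpd_2", "5-HT2A"): 64.0,
}
# Hypothetical potency values (e.g., Ki in molar) measured only for actives.
potency = {("cpd_1", "D2"): 3.2e-8, ("cpd_2", "5-HT2A"): 1.1e-7}


def build_tables(compounds, targets, primary, potency, threshold=30.0):
    """Return (table_210, table_220) dicts keyed by (compound, target)."""
    missing = [(c, t) for c in compounds for t in targets if (c, t) not in primary]
    if missing:
        raise ValueError(f"dataset is not full rank; untested pairs: {missing}")
    table_210 = {(c, t): "yes" if primary[(c, t)] >= threshold else "no"
                 for c in compounds for t in targets}
    table_220 = {(c, t): potency.get((c, t)) for c in compounds for t in targets}
    return table_210, table_220


t210, t220 = build_tables(["cpd_1", "cpd_2"], ["D2", "5-HT2A"], primary, potency)
```

Keeping the "no" entries explicit is what preserves the negative data that literature-derived datasets usually lack.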

Figs. 5A and 5B illustrate the use of a database 100, here shown as a receptor selectivity database, by way of example, as part of a screening process to discover and select new compounds as potential new drug candidates for further development (Fig. 5A) or new targets as potential validated targets to use to discover new drug candidates for specific disease indications (Fig. 5B). The database 100 may include a chemical compound component 300; a molecular target component 400; biological information components 500 and 600; and a screening results and assay database 200.

A new compound or set of compounds is introduced to a screening process 102 for determining whether it is effective in inhibiting the binding of a specific chemical compound (e.g., a reference agent) and a molecular target (see Fig. 5A). The screening process may use target information from the molecular target component 400.






JP 2004-500614 A 2004. 1.8 



WO 00*5421 PCT/US0OT1073 

The results of the screening process 102 may be stored in an intermediate database or entered into the screening results and assay database 200 of the receptor selectivity database 100. The results may also be stored in the biological information database 500 as particular parameters (e.g., cytotoxicity, etc.) as well as in the chemical compound database 300 (e.g., name of the compound, etc.).

The complete set of results from the screening process 102 may be stored in the screening results and assay database 200. The database 200 may be queried for those new compounds that exhibit an inhibitory effect on the binding of molecular targets and chemical compounds (e.g., reference agents) so that those new compounds can be further tested.
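A query of database 200 for new compounds showing an inhibitory effect might look like the sketch below. The single `screening_results` table, its columns (including the `is_new` flag marking compounds from the current screening run 102), the 30% default threshold, and the rows are all hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE screening_results (  -- hypothetical slice of database 200
        compound        TEXT,
        target          TEXT,
        reference_agent TEXT,
        pct_inhibition  REAL,
        is_new          INTEGER       -- 1 if screened in the current run 102
    )
""")
conn.executemany("INSERT INTO screening_results VALUES (?, ?, ?, ?, ?)", [
    ("new_1", "target_1", "ref_1", 72.0, 1),
    ("new_2", "target_1", "ref_1", 8.0, 1),
    ("old_1", "target_1", "ref_1", 90.0, 0),
])


def new_hits(conn, threshold=30.0):
    """Names of newly screened compounds that inhibit binding at or above the threshold."""
    rows = conn.execute(
        "SELECT DISTINCT compound FROM screening_results "
        "WHERE is_new = 1 AND pct_inhibition >= ? ORDER BY compound",
        (threshold,),
    ).fetchall()
    return [r[0] for r in rows]


print(new_hits(conn))  # ['new_1']
```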

Alternatively, a new molecular target, such as, for example, an "orphan" receptor about which the structure is known but the function or disease relevance is not known, is introduced to a screening process to be tested against the chemical compounds in the chemical compound database 300 (see Fig. 5B). Results of the screening process, including identification of chemicals that interacted with the new molecular target, are incorporated into the screening results database 200. Queries are made within database 100 to determine further steps to identify the function of the new molecular target and/or validate the disease relevance of the new target.

Fig. 6A illustrates the use of the database 100 for predicting the drug potential of a new compound. A table 710 relies on information from the chemical compound (300), molecular target (400), biological information (500 and 600), and screening results (200) databases. The table 710 is filled in with information from one or more of these databases (or tables) by executing an automatic query script to retrieve the information once a user provides the database 100 with information about a new chemical compound.

The query script used for the creation of table 710 may select chemical compounds from the chemical compound database 300 upon receiving the new compound information. The selection may be based on similar characteristics, such as chemical structure or other properties, between the new compound and the compounds already included in the database 300.
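The text does not specify how structural similarity is measured. A widely used choice, assumed here, is the Tanimoto coefficient between structural fingerprints, each represented as a set of "on" bit positions. Generating the fingerprints would require a cheminformatics toolkit and is not shown; the fingerprints, compound names, and 0.7 cutoff below are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_similar(new_fp, library, cutoff=0.7):
    """(name, score) pairs for library compounds at or above the cutoff, most similar first."""
    scored = [(name, tanimoto(new_fp, fp)) for name, fp in library.items()]
    return sorted((s for s in scored if s[1] >= cutoff), key=lambda s: (-s[1], s[0]))


# Hypothetical fingerprints for a new compound and three compounds in database 300.
new_compound = frozenset({1, 2, 3, 4})
library = {
    "cpd_1": frozenset({1, 2, 3, 4, 5}),
    "cpd_2": frozenset({1, 9}),
    "cpd_3": frozenset({1, 2, 3, 4}),
}
print(select_similar(new_compound, library))  # [('cpd_3', 1.0), ('cpd_1', 0.8)]
```

The cutoff trades breadth for relevance: a lower value pulls in more analogs, and therefore more targets and biological records, into table 710.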

After the selection of chemical compounds, the query script selects targets from the target database 400 that are known to react (e.g., bind) with the selected compounds. Finally, the combination of selected chemical compounds and selected molecular targets may be used for querying the biological information databases 500 and 600 and inserting biological information corresponding to chemical compound-molecular target pairings into table 710. Alternatively, the user may enter a specific biological information category of interest (e.g., toxicity) so that the biological information included in table 710 is limited to that category.

The table 710 may be queried by the user to produce information relevant to the predictability of the potential use of the new compound as a drug. An example of this would be a query of the molecular targets known to react with chemical compounds associated with the new compound, and the known side effects produced by the chemical compounds when combined with the retrieved targets.
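The query-script steps for table 710 (take the selected compounds, find targets they are known to interact with, attach biological information from tables 500 and 600, and optionally keep one category) can be sketched with plain Python data structures. The function name, argument layout, and example data are hypothetical; a production version would more likely issue SQL joins against the tables.

```python
def build_table_710(similar_compounds, screening_hits, compound_bio, target_bio, category=None):
    """Assemble table 710 rows for compounds selected as similar to a new compound.

    similar_compounds: compound names selected from database 300
    screening_hits:    set of (compound, target) pairs known to interact (database 200)
    compound_bio:      {compound: {category: value}}  (table 500)
    target_bio:        {target: {category: value}}    (table 600)
    category:          optional single category (e.g., "toxicity") to keep
    """
    rows = []
    for compound in similar_compounds:
        targets = sorted(t for c, t in screening_hits if c == compound)
        for target in targets:
            c_info = compound_bio.get(compound, {})
            t_info = target_bio.get(target, {})
            if category is not None:
                c_info = {k: v for k, v in c_info.items() if k == category}
                t_info = {k: v for k, v in t_info.items() if k == category}
            rows.append({"compound": compound, "target": target,
                         "compound_info": c_info, "target_info": t_info})
    return rows


# Hypothetical inputs.
hits = {("cpd_1", "D2"), ("cpd_1", "H1"), ("cpd_2", "D2")}
compound_bio = {"cpd_1": {"toxicity": "tox_1", "side_effects": "se_1"},
                "cpd_2": {"toxicity": "tox_2", "side_effects": "se_2"}}
target_bio = {"D2": {"toxicity": "tox_D2"}, "H1": {"toxicity": "tox_H1"}}

table_710 = build_table_710(["cpd_1"], hits, compound_bio, target_bio, category="toxicity")
```

A follow-up user query, such as retrieving side effects for each retrieved target, is then a simple filter over these rows (run without the `category` restriction).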

Fig. 6B illustrates the use of the database 100 to validate the disease relevance and/or the biological function of a new molecular target using an approach similar to that used to predict the drug potential of a new compound, but with the data inputs and queries shown in Fig. 6D.

All patents, patent applications, and publications mentioned are incorporated by reference in their entirety into this application.

The foregoing description of embodiments of the present invention provides an exemplary illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention.
WHAT IS CLAIMED IS:

1. A computer system comprising:

a first database containing records corresponding to a plurality of chemical compounds and records corresponding to biological information related to effects of such chemical compounds on biological systems;

a second database containing records corresponding to a plurality of molecular targets;

a third database containing records corresponding to tests of interactions between compounds in the first database and molecular targets in the second database, the tests including information on the effect that a compound from the plurality of compounds has on the interaction of a compound known to interact with a molecular target from the plurality of molecular targets and said molecular target; and

a user interface allowing a user to view the selected compound and to selectively view information from the first database, the second database, and the third database as it relates to a compound record in the first database or as it relates to a molecular target in the second database.

2. The computer system of claim 1, wherein the interaction includes binding and the effect includes inhibitory effect.

3. The computer system of claim 1, wherein the chemical compounds include
compounds with no known biological activity or that have failed in tests.

4. The computer system of claim 1, wherein the chemical compounds include
compounds tested in animals.

5. The computer system of claim 1, wherein the chemical compounds include
compounds known to have an effect on the environment.

6. The computer system of claim 1, wherein the chemical compounds include
pharmacological reference agents.

7. The computer system of claim 1, wherein the chemical compounds include
known pharmaceuticals in the market for clinical use for which there is a substantial
amount of biological information available.
8. The computer system of claim 1, wherein the chemical compounds include
compounds approved for testing in humans.
9. The computer system of claim 1, wherein the chemical compounds include
compounds obtained from natural resources that exhibit biological activity. 

10. The computer system of claim 1, wherein the molecular targets include
receptors.

11. The computer system of claim 1, wherein the molecular targets include
enzymes.

12. The computer system of claim 1, wherein the molecular targets include nucleic
acids.

13. The computer system of claim 1, wherein the molecular targets include
carbohydrates.

14. The computer system of claim 1, wherein the records of the first database 
corresponding to a plurality of chemical compounds are organized in categories related to 
the description and properties of the compounds. 

15. The computer system of claim 14, wherein the categories include:
compound name;

compound type; 

physical-chemical characteristics; 

chemical space coordinates or structural descriptors; and 

solubility. 
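The categories listed in claims 14 and 15 map naturally onto a record type. The minimal sketch below assumes a Python dataclass; the field names, the dictionary used for physical-chemical characteristics, the tuple used for chemical-space coordinates, and the mg/mL solubility unit are all illustrative assumptions, and `group_by_type` is a hypothetical helper showing one way records could be organized by category.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

@dataclass
class CompoundRecord:
    """One first-database record, organized by the categories of claim 15."""
    name: str                                                  # compound name
    compound_type: str                                         # compound type
    physchem: Dict[str, float] = field(default_factory=dict)   # physical-chemical characteristics
    descriptors: Tuple[float, ...] = ()                        # chemical space coordinates / descriptors
    solubility_mg_per_ml: Optional[float] = None               # solubility (unit assumed)

def group_by_type(records: Iterable[CompoundRecord]) -> Dict[str, List[str]]:
    """Hypothetical helper: list compound names under each compound type."""
    groups: Dict[str, List[str]] = {}
    for rec in records:
        groups.setdefault(rec.compound_type, []).append(rec.name)
    return groups
```

For example, records typed "natural product" and "failed drug" would fall into two groups, mirroring the natural-product and failed-drug databases of claims 16 and 17.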

16. The computer system of claim 1, wherein the first database includes a natural
product database.

17. The computer system of claim 1, wherein the first database includes a failed
drug database.

18. The computer system of claim 1, wherein the first database includes a chemical
registry database.

19. The computer system of claim 1, wherein the second database includes a
three-dimensional structure database.

20. The computer system of claim 1, wherein the second database includes a
sequence/mutation database.

21. The computer system of claim 1, wherein the second database includes a
genomic database.
22. The computer system of claim 1, wherein the records in the third database
corresponding to biological information related to the chemical compounds' effects on the
biological targets are organized in categories that include:

compound name;
target name;
toxicity;
side effects; and
mechanism of drug action.
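The five categories of claim 22 can likewise be sketched as an immutable record keyed by the compound/target pair. This is a minimal sketch assuming Python dataclasses; the field names and the hypothetical `index_by_pair` helper are illustrative and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class ActivityRecord:
    """One biological-information record, organized by the categories of claim 22."""
    compound_name: str
    target_name: str
    toxicity: str
    side_effects: Tuple[str, ...]
    mechanism: str          # mechanism of drug action

def index_by_pair(records: Iterable[ActivityRecord]) -> Dict[Tuple[str, str], List[ActivityRecord]]:
    """Hypothetical helper: group records by (compound name, target name) for lookup."""
    index: Dict[Tuple[str, str], List[ActivityRecord]] = {}
    for rec in records:
        index.setdefault((rec.compound_name, rec.target_name), []).append(rec)
    return index
```

Keying on the pair lets a viewer pull every toxicity, side-effect, and mechanism entry for one compound acting on one target in a single lookup.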

23. The computer system of claim 1, further comprising means for setting an
interaction test threshold corresponding to said effect and means for selecting the
compound when its use results in a test meeting the interaction test threshold.
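The threshold-and-select step of claim 23 reduces to a filter over test results. The sketch below assumes each test is a `(compound, target, effect)` tuple with the effect expressed as percent inhibition, and assumes that "meeting" the threshold means the effect is greater than or equal to it; the function name `select_hits` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed test representation: (compound name, target name, percent inhibition).
TestResult = Tuple[str, str, float]

def select_hits(tests: Iterable[TestResult], threshold: float) -> List[str]:
    """Return, sorted, every compound with at least one test meeting the threshold.

    "Meeting" is assumed here to mean effect >= threshold.
    """
    hits = {compound for compound, _target, effect in tests if effect >= threshold}
    return sorted(hits)
```

Raising the threshold narrows the selection to compounds with stronger effects; lowering it admits weaker interactions.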

24. A method for analyzing data relevant to drug discovery and development 
comprising: 

selecting chemical compounds from a first database containing records
corresponding to a plurality of chemical compounds;

selecting molecular targets from a second database containing records 
corresponding to a plurality of molecular targets; 

producing information corresponding to the interactions between each of the
selected chemical compounds and each of the selected molecular targets;
selecting a biological activity from a third database containing records
corresponding to biological information related to effects of chemical compounds on
biological targets; and

using the produced information to correlate patterns of interactions between 
chemical compounds and molecular targets associated with the selected biological activity. 
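Claim 24 does not say how the interaction patterns are correlated with the selected activity. One simple possibility, used here purely as an illustrative stand-in, is a per-target enrichment score: the fraction of compounds having the activity that hit a target, minus the fraction of compounds lacking it that hit the same target. The input format (a set of hit targets per compound) and the function name `target_enrichment` are assumptions.

```python
from typing import Dict, List, Set, Tuple

def target_enrichment(profiles: Dict[str, Set[str]],
                      active: Set[str]) -> List[Tuple[str, float]]:
    """Score each target by (hit rate among active compounds) - (hit rate among inactive ones).

    profiles: compound -> set of targets it interacts with (e.g. after thresholding).
    active:   compounds annotated with the selected biological activity.
    Returns (target, score) pairs, highest score first, ties broken by target name.
    """
    actives = [c for c in profiles if c in active]
    inactives = [c for c in profiles if c not in active]
    targets = sorted(set().union(*profiles.values()))

    def hit_rate(group: List[str], target: str) -> float:
        # An empty group contributes a rate of 0 rather than dividing by zero.
        return sum(target in profiles[c] for c in group) / len(group) if group else 0.0

    scores = [(t, hit_rate(actives, t) - hit_rate(inactives, t)) for t in targets]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores
```

A target near the top of the list interacts mostly with compounds that show the selected activity, which makes it a candidate for association with that activity; a more rigorous implementation might replace the raw difference with a statistical test.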


25. The method of claim 24, wherein the step of producing information includes
the steps of: 

generating binding data of the binding between each of the selected chemical
compounds and each of the selected molecular targets by monitoring the inhibitory effect
that an unknown compound has on said binding;

setting a binding test threshold corresponding to the inhibitory effect; and
generating information on the combination of unknown compound, molecular
target, and chemical compound that meets or fails to meet the binding test threshold.

26. The method of claim 25, wherein the binding data comprises positive and
negative binding information. 
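Claims 25 and 26 describe measuring how much an unknown compound inhibits the binding of a chemical compound to a molecular target, comparing that inhibition with a threshold, and recording each combination as positive or negative binding information. The sketch below assumes a simple displacement readout in which inhibition is `100 * (1 - signal_with_test / control_signal)` and assumes that a result meets the threshold when inhibition is greater than or equal to it; `BindingResult`, `percent_inhibition`, and `classify` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class BindingResult:
    unknown_compound: str
    target: str
    reference_compound: str   # chemical compound whose binding is being inhibited
    pct_inhibition: float
    positive: bool            # True: meets threshold; False: negative binding information

def percent_inhibition(bound_with_test: float, bound_control: float) -> float:
    """Inhibition of reference binding caused by the unknown compound (assumed formula)."""
    if bound_control <= 0:
        raise ValueError("control binding signal must be positive")
    return 100.0 * (1.0 - bound_with_test / bound_control)

def classify(unknown: str, target: str, reference: str,
             bound_with_test: float, bound_control: float,
             threshold: float) -> BindingResult:
    """Record one unknown/target/reference combination as positive or negative."""
    inhibition = percent_inhibition(bound_with_test, bound_control)
    return BindingResult(unknown, target, reference, inhibition, inhibition >= threshold)
```

Keeping the negative results, rather than discarding them, preserves the "fails to meet" information that claim 26 includes in the binding data.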